Diagnostic performance of deep learning-based automatic white matter hyperintensity segmentation for classification of the Fazekas scale and differentiation of subcortical vascular dementia

Purpose To validate the diagnostic performance of commercially available, deep learning-based automatic white matter hyperintensity (WMH) segmentation algorithm for classifying the grades of the Fazekas scale and differentiating subcortical vascular dementia. Methods This retrospective, observational, single-institution study investigated the diagnostic performance of a deep learning-based automatic WMH volume segmentation to classify the grades of the Fazekas scale and differentiate subcortical vascular dementia. The VUNO Med-DeepBrain was used for the WMH segmentation system. The system for segmentation of WMH was designed with convolutional neural networks, in which the input image was comprised of a pre-processed axial FLAIR image, and the output was a segmented WMH mask and its volume. Patients presented with memory complaint between March 2017 and June 2018 were included and were split into training (March 2017–March 2018, n = 596) and internal validation test set (April 2018–June 2018, n = 204). Results Optimal cut-off values to categorize WMH volume as normal vs. mild/moderate/severe, normal/mild vs. moderate/severe, and normal/mild/moderate vs. severe were 3.4 mL, 9.6 mL, and 17.1 mL, respectively, and the AUC were 0.921, 0.956 and 0.960, respectively. When differentiating normal/mild vs. moderate/severe using WMH volume in the test set, sensitivity, specificity, and accuracy were 96.4%, 89.9%, and 91.7%, respectively. For distinguishing subcortical vascular dementia from others using WMH volume, sensitivity, specificity, and accuracy were 83.3%, 84.3%, and 84.3%, respectively. Conclusion Deep learning-based automatic WMH segmentation may be an accurate and promising method for classifying the grades of the Fazekas scale and differentiating subcortical vascular dementia.


Introduction
White matter hyperintensity (WMH) is known to represent accumulated tissue damage of white matter and is detected on T2-weighted images and Fluid Attenuated Inversion Recovery (FLAIR) images showing hyperintense signal intensities [1,2]. It is associated with executive dysfunction regardless of its location [3], and a potential association exists between WMH and diverse diseases, including cerebrovascular, as well as other neurological or psychiatric diseases [4][5][6][7]. Cognitive impairment is also associated with WMH [8,9].
The volume of WMH is thought to represent the size of cerebral small vessel disease burden, and a larger volume of WMH is reported to have an association with poor functional outcome or treatment response in patients with acute ischemic stroke [10][11][12][13]. For volume estimation, visual assessment has been mainly performed such as the Fazekas scale [14] in clinical practice, which is also easy and quick to perform. However, reliable assessment is difficult and sufficient experience is needed owing to the inherent heterogeneous nature in size, number, shape, and location of WMH, which increases with age and cannot be objective [15]. These problems remain in manual segmentation of WMH, which is time-consuming and labor-intensive.
In recent years, automated methods for measuring the volume of WMH were tried using various methods of deep learning [16,17], and convolutional neural networks (CNN) have become one of the main methods with variable performance [18][19][20][21][22]. Park et al. [22] used a U-Net model adopting a multi-scale approach that obtains significant features from intermediate decoder layers to segment WMH. Wu et al. [21] developed a skip connection U-Net that adds skip connections to standard U-Net and achieved faster convergence speed and improved segmentation performance. The VUNO Med-DeepBrain is one of the commercially available segmentation models, which employed a 2D U-Net using only T2-FLAIR MRI for development and focused on treating highly unbalanced WMH labels, mainly owing to deep WMH by applying generalized dice loss [23]. The difficulty of deep WMH underestimation was also suggested in another article [24] that utilized domain adaptation methods to develop a robust segmentation model regardless of scanner or sequence types.
Meanwhile, subcortical vascular dementia (or subcortical ischemic vascular dementia) is a major form of vascular dementia that results from complete or incomplete infarction of cerebral subcortical structures due to small artery disease [25]. The main clinical features include cognitive impairment [26], which is related to disruption of the prefrontal-subcortical circuit by the accumulation of lacunar infarctions and white matter ischemia [27,28] and interruption of cholinergic pathways traversing the subcortical white matter [1,29]. Cholinergic pathway comprise neurons regulating synthesis and release of acetylcholine, which is a ubiquitous neurotransmitter having a central role in neurotransmission and cognition [30]. One of the diagnostic criteria is severe WMH on T2-weighted and FLAIR images [31].
In this study, we aimed to evaluate the diagnostic performance of a deep learning-based automatic WMH segmentation algorithm for classifying grades of the Fazekas scale and differentiating subcortical vascular dementia, which has not been evaluated yet.

Patient inclusion
All data were fully anonymized before access, and the institutional review boards (IRB) approved this observational study of a single tertiary referral hospital (IRB No. 2020-1352). The requirement for informed consent was waived given the retrospective design. A computerized search of electronic medical records was performed to identify consecutive patients attending the outpatient neurologic memory clinic to evaluate memory impairment between March 2017 and June 2018. To validate a segmentation model, the dataset was split into the training set (March 2017-March 2018, n = 596) and internal validation test set (April 2018-June 2018, n = 204). The eligibility criteria included patients who: (a) presented with memory complaints, (b) underwent brain MRI as part of their initial evaluation, (c) had available electronic medical records, (d) did not have a previous history of the cerebral infarct, and (e) did not have other white matter abnormality such as metabolic encephalopathy or postoperative change. Brain MRI was performed within 1 month of presentation.

Brain MRI protocol
Brain MRI was performed on 3.0 T units (Ingenia, Philips Medical Systems, Best, the Netherlands) using a 32-channel sensitivity-encoding head coil. High-resolution anatomical threedimensional (3D) volume images were acquired using a 3D gradient-echo T1-weighted sequence in the sagittal plane. The detailed parameters were as follows: repetition time (TR), 9.6 ms; an echo time (TE), 4.6 ms; a flip angle, 8˚; a field of view (FOV), 224 mm × 224 mm; slice thickness, 1 mm with no gap; and a matrix size, 224 × 224. Two-dimensional (2D) axial FLAIR images were obtained as follows: TR, 9000 ms; TE, 125 ms; FOV, 220 mm × 220 mm; slice thickness, 4 mm with no gap; and a matrix size, 256 × 256.

MRI analysis and reference standards
The FLAIR images were reviewed in consensus by two neuroradiologists with 10 and 31 years of neuroradiology experience, respectively. They were blinded to all the clinical information. WMH was categorized as normal, mild, moderate, or severe according to the Fazekas scale in both the periventricular and deep white matter [3]. Periventricular WMH was graded as follows: (a) normal, absence; (b) mild, "caps" or pencil-thin lining; (c) moderate, smooth "halo"; and (d) severe, irregular periventricular hyperintensity extending into the deep white matter [3]. Deep WMH was graded as follows: (a) normal, absence; (b) mild, punctate foci; (c) moderate, beginning confluence of foci; and (d) severe, large confluent areas [3]. In case of discordance, a consensus was made by two neuroradiologists for each periventricular WMH and deep WMH. In addition, the representative grade of WMH for each case was determined as maximal grading of either periventricular or deep white matter. Sensitivity was defined as the proportion of a test positive (mild/moderate/severe, moderate/severe or severe from Deep-Brain), conditioned on a true positive (mild/moderate/severe, moderate/severe or severe, based on a consensus by two neuroradiologists). Interobserver agreement between the two neuroradiologists was determined for all patients.
The patients who fulfilled the following criteria were diagnosed with subcortical vascular dementia by clinicians [31]: (a) met the criteria for Vascular dementia by the Diagnostic and Statistical Manual of Mental Disorders-fourth edition [32]; (b) had at least one focal neurologic sign on the Focal Neurologic Sign Score evaluation [31]; (c) had severe WMH on T2-weighted and FLAIR images, defined as periventricular WMH with a cap or rim >10 mm in maximum diameter, deep WMH with extensive or diffusely confluent form �25 mm in maximum diameter [31]. Patients with images that showed territorial, or watershed infarctions were excluded. All relevant data are within the manuscript and its Supporting Information files (S1 Appendix).

Deep learning-based WMH segmentation algorithm
The VUNO Med-DeepBrain (version 1.0.1, VUNO Inc., Seoul, South Korea), which has been commercially available in South Korea since June 2019 and Europe since June 2020, was used for the deep learning-based segmentation system. Deep learning algorithms can provide the segmentation of detailed regions of the T1-weighted brain MRI and WMH regions of the FLAIR image. The software extracts the segmented 104 brain regions masked image whose space, direction, orientation information equal to the original 3D T1 MR Image. The total white matter volume is calculated using space information and the number of the pixel of the total segmented white matter region.
The system for segmentation of WMH was designed with CNN, in which the input image was comprised of a pre-processed FLAIR image, and the output was a segmented WMH mask and its volume (Fig 1). Brain parenchyma were extracted from the FLAIR images using the inhouse brain extraction tool with a rigid transformation for template matching and a 3D UNet deep learning model. The UNet architecture is an end-to-end CNN that conducts high-performance biomedical image segmentation [33]. The UNet architecture contains two modules: encoder and decoder. The encoder modules are a classic stack of CNN and max pooling layers that analyze the context that represents the implicit features of the input image. Whereas the decoder modules are a symmetric stack of transposed CNN that perform localization. The UNet applies a skip connection between the encoder modules' feature and decoder modules' feature for improved localization performance. UNet is widely used, primary in medical image segmentation, because of its high performance. The 3D UNet architecture consists of 4 encoder blocks and 3 decoder blocks using 3D convolution blocks (3D-conv block), 3D up convolution blocks (3D-upconv block) and a 1x1 3D convolution layer. The 3D-conv block sequentially includes 3D convolution-3D batch normalization-Rectified Linear Unit (ReLU) activation-3D convolution-3D batch normalization-ReLU activation. First 3D-conv block has padding 3 and kernel size 5 for the convolution layers, otherwise has padding 1 and kernel size 3. For the encoder, we perform 3D-conv block calculation and then conduct a 3D maxpooling layer (kernel size 2, stride 2) to re-duce an image dimension. For the decoder, we act 3D-upconv block (3D convolution Transpose-3D batch normalization-ReLU activation) and then perform the 3D-conv block. Finally, the 1x1 3D convolution layer is applied to produce an output mask. This model has [8,16,32,64] channels for the encoder blocks [8,16,32], channels for the de-coder blocks, and a final layer has a one channel. A WMH segmentation model has a similar architecture to the 3D UNet, however, there are some differences. This model is based on 2D layers (2D convolution, 2D convolution Transpose, 2D batch normalization, 2D max-pooling layers) and consists of 5 encoder blocks and 4 decoder blocks. It has [16,32,64,128,256] channels for the encoder blocks [128, 64, 16,32], channels for the decoder blocks. The WMH segmentation model was developed for effective segmentation for large masks such as periventricular WMH and small masks such as deep WMH [33]. The pre-processed image passed through the CNN to generate a segmentation mask. The overall processing time is 2 minutes per case. Fig 2 shows three examples of a pre-processed FLAIR image and a mask of segmented WMH, with total WMH volume of each case classified according to Fazekas. This deep learning-based segmentation system was trained using ADAM optimizer with learning rate 1e-4, 32 batch size, 100 epochs, generalized dice loss function. Elastic deformation, mirror transformation was used for the image augmentation technique. The data regarding the segmentation model (DeepBrain), which is presented in this study, are owned by VUNO Inc.

Statistical analyses
The primary outcome was the diagnostic performance of automatic WMH volume and volume ratio for classifying the grades of Fazekas scale with its optimal cut-off values. WMH volume ratio was defined as automatic WMH volume/total white matter volume × 100. One-way analysis of variance was done among groups with different Fazekas scales in both training and test sets. The interobserver agreement of the Fazekas scale for each periventricular WMH and deep WMH between two neuroradiologists was calculated using weighted Kappa. Optimal cut-off values for determining the Fazekas scale were obtained from receiver operating The deep learning-based WMH segmentation algorithm is consisting of two independent processes. First, brain extraction is conducted with two rigid transformation and in-house brain extraction algorithm using 3D U-Net. Second, in-house convolutional neural networks segment WMH from the preprocessed brain parenchyma image.
https://doi.org/10.1371/journal.pone.0274562.g001 characteristic (ROC) curves, with the sensitivity, specificity, and area under each curve (AUC) calculated using the Youden index [34], defined as sensitivity + specificity-1 (values ranged from -1 to +1). The secondary outcome was the diagnostic performance of automatic WMH volume and volume ratio for differentiating subcortical vascular dementia with its optimal cut-off values. All statistical analyses were performed using MedCalc version 18.6 (MedCalc Software, Ostend, Belgium), with P values < 0.05 defined as statistically significant. Meanwhile, we obtained a Dice similarity coefficient score using our test set by manual segmentation to evaluate the competence of the model.

Patient demographics
Between March 2017 and June 2018, 830 patients attended the outpatient clinic for memory complaints. Patients with previous cerebral infarct (n = 16), white matter abnormality (n = 3), no FLAIR examination (n = 3), poor image quality (n = 3), meningioma (n = 2), trauma (n = 1), normal pressure hydrocephalus (n = 1), and Creutzfeldt-Jakob disease (n = 1) were excluded, leaving the remaining 800 consecutive patients who were included in this analysis (Fig 3). The mean age ± standard deviation (SD) of the training and test sets was 69. Among the 596 patients in the training set, 28 (4.7%) were diagnosed with subcortical vascular dementia. The mean age ± SD was 76.7 years ± 6.3, and 15 patients were women. The mean duration of education in years ± SD was 9.4 ± 5.2, the mean MMSE score ± SD was 18.7 ± 5.3, and the mean Clinical Dementia Rating (CDR) was 0.9 ± 0.3. The mean total white matter volume ± SD was 392.0 mL ± 56.0. Among the 204 patients in the test set, 6 (2.9%) were diagnosed with subcortical vascular dementia. The mean age ± SD was 77.3 years ± 7.0, and 6 patients were women. The mean duration of education in years ± SD was 5.8 ± 3.0, the mean MMSE score ± SD was 18.2 ± 4.4, and the mean CDR was 1.0 ± 0.5. The mean total white matter volume ± SD was 380.0 mL ± 29.7.

Diagnostic performance for classifying Fazekas scale
The diagnostic performance of WMH volume and volume ratio for classifying the grades of the Fazekas scale with the cut-off values are described in Table 2. When differentiating normal/mild vs. moderate/severe using WMH volume in the test set (n = 204), sensitivity,

PLOS ONE
Diagnostic performance of deep learning-based automatic WMH segmentation and subcortical vascular dementia

PLOS ONE
Diagnostic performance of deep learning-based automatic WMH segmentation and subcortical vascular dementia

Diagnostic performance for differentiating subcortical vascular dementia
Among the 204 patients of the test set, 2.9% (6/204) were diagnosed as subcortical vascular dementia with (n = 4) or without combined Alzheimer's disease (n = 2). For distinguishing subcortical vascular dementia from others using WMH volume, sensitivity, specificity, and accuracy were 83.3% ( ). Therefore, the deep learning-based automatic WMH segmentation algorithm may be an accurate and promising method for classifying the grades of the Fazekas scale and differentiating subcortical vascular dementia. The advantage of using the Fazekas scale is the easy perception of WMH burden because it is the most prevalent method for grading WMH lesions. However, it is a qualitative method and limited to accurately evaluating WMH volume using only 4 grades. Numerous attempts have been done for quantitative measurement of WMH using deep learning [16]. There was a MICCAI WMH segmentation challenge [35] in which the winner used a 2D U-Net model that applied an ensemble method and achieved a 0.80 dice similarity coefficient. DeepBrain, which also employed a 2D U-Net, focused on treating highly unbalanced WMH labels by applying generalized dice loss, and our model used only T2-FLAIR MRI for developing a WMH segmentation model. DeepBrain showed relatively high performance by achieving a 0.746 dice similarity coefficient score (median value) for our test set. However, direct comparison with the winner of the challenge model was not possible because the external test set of the competition has not been released.
We speculated that if suitable volume cut-off values can be evaluated with acceptable diagnostic performance based on a well-developed automatic volumetry, an automated and reliable Fazekas scale can be determined. We expect this would help the better perception of WMH burden in both quantitative and qualitative ways; moreover, it would facilitate followup of the burden of WMH. One study investigated the diagnostic performance of the automatic WMH segmentation method in distinguishing high Fazekas grades (score = 2 or 3) and low Fazekas grades (score = 0 or 1) using the Lesion Segmentation Tool software [36]. The AUC value from that study was 0.93, whereas our results showed that AUC was 0.981 (95% CI: 0.951-0.995), with the cut-off being 9.6 mL for differentiating normal/mild vs. moderate/ severe. However, our study provided additional diagnostic performance with optimal cut-off values for classifying normal and severe Fazekas grades. We also evaluated the WMH volume ratio. Moreover, our study noted the diagnostic performance of WMH in differentiating subcortical vascular dementia from other subtypes of dementia, which had not been evaluated previously.
Subcortical vascular dementia is a type of vascular dementia which is the second most common cause of dementia following Alzheimer's disease (AD) [37]. Two-thirds of patients with subcortical vascular dementia have pathologic features of AD, and one-third of patients with AD have vascular pathology, which implies that subcortical vascular dementia and AD may overlap pathologically [38][39][40][41]. However, the existence of pure subcortical vascular dementia was discovered based on the amyloid imaging, which represents significant extent of subcortical white matter ischemic changes without evidence of amyloid plaque deposition in the brain [42]. Apart from mixed type with AD, pure subcortical vascular dementia demonstrates less difficulties in verbal/visual memory-related tasks but more problem in executive functions [25,[43][44][45][46]. However, the diagnosis of subcortical vascular dementia was not made on the basis of the amyloid imaging in our study because it is not routinely performed in clinical practice. Rather, clinical and radiologic criteria were used by clinicians as previously described [31].
Our study has several limitations. First, because it was a retrospective in design, absence of bias in patient selection could not be guaranteed. Second, the study was performed in a single center. However, it was conducted with a large cohort of 800 consecutive patients. Further multi-centered prospective studies will be required. Third, the cut-off values may depend on the scanner and the algorithms used for the automatic WMH. Fourth, our study MRI protocol used 2D FLAIR images with 4 mm section thickness, thus, there might be a variation for obtaining volume measurement. Further study using 3D FLAIR images might be needed. Fifth, we could not evaluate the competence of the already developed and commercially available segmentation model. However, Dice similarity coefficient score was 0.714 ± 0.149 (mean ± SD) with the median of 0.746 using our test set.

Conclusions
In conclusion, the deep learning-based automatic WMH segmentation algorithm may be an accurate and promising method for classifying the grades of the Fazekas scale and differentiating subcortical vascular dementia.